Evidence for a Common Evolutionary Origin of Coronavirus Spike Protein Receptor-Binding Subunits
Among different coronavirus genera, the receptor-binding S1 subunits of their spike proteins differ in primary, secondary, and tertiary structures. This study identified shared structural topologies (connectivity of secondary structural elements) in S1 domains of different coronavirus genera. The results suggest that coronavirus S1 subunits share a common evolutionary origin but have attained diverse sequences and structures through extensive divergent evolution. The results also improve understanding of the structures and functions of coronavirus S1 domains whose tertiary structures are currently unknown.


Traces of the origins of viruses and viral proteins are often masked or even erased by the high mutation rates, long evolutionary history, and many genetic tricks of viruses. For viral proteins, evolutionary records are more likely to be conserved in their tertiary structures than in their gene sequences, because evolutionary pressure forces these proteins to maintain their functions within a certain structural framework. Sometimes, however, evolutionary clues are lost even in the tertiary structures of viral proteins, creating evolutionary conundrums. This study investigated such conundrums surrounding coronavirus (CoV) spike proteins that have different tertiary structures.

CoVs, a family of enveloped, positive-stranded RNA viruses, can be divided into three major genera or groups: the alpha-CoVs (group 1), the beta-CoVs (group 2), and the gamma-CoVs (group 3) (5). Representative members of each genus are listed in Fig. 1. A trimeric spike protein anchored on the coronavirus envelope mediates viral entry into host cells. It contains an ectodomain, a transmembrane anchor, and a short intracellular tail (Fig. 1A). During molecular maturation, the ectodomain is often cleaved into a receptor-binding S1 subunit and a membrane-fusion S2 subunit. For cell entry, S1 binds to a host receptor for viral attachment, and S2 undergoes dramatic structural changes to fuse the viral and host membranes. The sequences, structures, and membrane-fusion mechanisms of the S2 subunits are conserved among different coronavirus genera (10, 11). However, the S1 subunits from different coronavirus genera share little or no significant sequence similarity (Fig. 1B). Two independent domains have been identified in the S1 subunits from different coronavirus genera, the N-terminal domain (NTD) and the C-domain (Fig. 1A), both of which can bind host receptors and hence function as receptor-binding domains (RBDs) (4). Furthermore, coronavirus S1 subunits recognize a variety of host receptors, including proteins and sugars. The diversity of their S1 sequences and receptor usage presents evolutionary puzzles surrounding coronavirus spike proteins (5).

To date, crystal structures are available for three coronavirus S1 domains: the NTD of beta-genus mouse hepatitis coronavirus (MHV) and the C-domains of alpha-genus NL63 coronavirus (NL63-CoV) and beta-genus severe acute respiratory syndrome coronavirus (SARS-CoV) (3, 4, 9). According to searches performed with the DALI protein structure database search server (2), MHV NTD shares the same fold as human galectins, whereas the SARS-CoV and NL63-CoV C-domains each represent a novel fold (Fig. 2A and B). Interestingly, despite their different structures, the NL63-CoV and SARS-CoV C-domains bind to overlapping regions on their common receptor, human angiotensin-converting enzyme 2 (ACE2) (Fig. 3A and B) (9). We previously hypothesized that the SARS-CoV and NL63-CoV C-domains diverged from a common ancestor into different structures and then converged functionally to recognize the same ACE2 receptor (8, 9). In this report, we present evidence supporting this hypothesis and extend the evolutionary discussion to all three coronavirus genera.

The SARS-CoV and NL63-CoV C-domains differ in primary, secondary, and tertiary structures. The sequence similarity between their S1 subunits is 10%, not significantly higher than the similarity between two random protein sequences (Fig. 1B). The core structure of the NL63-CoV C-domain is a β-sandwich consisting of two β-sheet layers that stack against each other. The two β-sheet layers consist of five strands (β5-β7-β8-β6-β3) and three strands (β4-β1-β2), respectively (Fig. 2A and B). In contrast, the core structure of the SARS-CoV C-domain is a single-layer, five-stranded β-sheet (β5-β7-β8-β6-β3) with two α-helices (α4-α1) stacked against it (Fig. 2A and B). The DALI Z-score from a comparison of the SARS-CoV and NL63-CoV C-domains indicates no significant similarity between their tertiary structures.
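The claim that a 10% similarity is "not significantly higher" than that of random sequences can be made concrete by comparing the observed identity with a background built from shuffled sequences and expressing the gap in standard deviations (the SD unit noted in Fig. 1B). The Python sketch below is an illustration under stated assumptions, not the ClustalW procedure behind Fig. 1B: it uses a plain Needleman-Wunsch global alignment with a hypothetical scoring scheme (match +1, mismatch -1, linear gap -2) instead of ClustalW's substitution matrices and gap model, it counts identities rather than similarities, and all function names are illustrative.

```python
import random
import statistics


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings, with '-' marking gaps.
    Scoring values are hypothetical placeholders, not ClustalW's.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned positions as a percentage of alignment length."""
    x, y = global_align(a, b)
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)


def identity_z_score(a, b, trials=200, seed=0):
    """Compare observed identity with identities against shuffled copies of b."""
    rng = random.Random(seed)
    observed = percent_identity(a, b)
    background = []
    for _ in range(trials):
        letters = list(b)
        rng.shuffle(letters)  # keeps composition, destroys order
        background.append(percent_identity(a, "".join(letters)))
    mu = statistics.mean(background)
    sd = statistics.stdev(background)
    return observed, mu, sd, (observed - mu) / sd
```

A z-score near zero corresponds to the "no better than random" situation described for the NL63-CoV and SARS-CoV S1 subunits. Real S1 subunits are several hundred residues long, so this pure-Python aligner is slow for hundreds of shuffles; it is meant only to show the reasoning on short toy strings.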

FIG 1 Spike proteins from three coronavirus genera or groups. (A) Schematic representation of coronavirus spike proteins. NTD, N-terminal domain; FP, fusion peptide; HR-N, heptad repeat N; HR-C, heptad repeat C; TM, transmembrane anchor; IC, intracellular tail. (B) Sequence similarities among S1 subunits from representative CoVs (coronaviruses). NL63, human NL63 coronavirus strain Amsterdam I; TGEV, porcine transmissible gastroenteritis virus strain Purdue; MHV, mouse hepatitis coronavirus strain A59; BCoV, bovine coronavirus strain ENT; SARS, SARS coronavirus strain Tor2; IBV, avian infectious bronchitis virus strain M41; SARS pol, SARS polymerase strain Tor02; SD, standard deviation. Alpha-, beta-, and gamma-CoVs are also referred to as group 1, group 2, and group 3 coronaviruses, respectively. Sequence similarities were calculated using ClustalW (1).

FIG 2 Structures of alpha-coronavirus NL63-CoV and beta-coronavirus SARS-CoV C-domains. (A) Crystal structure of NL63-CoV C-domain (PDB identification no. 3KBH). (B) Crystal structure of SARS-CoV C-domain (PDB identification no. 2AJF). Structural similarity Z-scores were calculated using the DALI server (2). (C) Topology of NL63-CoV C-domain. (D) Topology of SARS-CoV C-domain. RBM, receptor-binding motif.

FIG 3 Receptor binding by alpha-coronavirus NL63-CoV and beta-coronavirus SARS-CoV C-domains. (A) Crystal structure of NL63-CoV C-domain complexed with human ACE2 (PDB identification no. 3KBH). (B) Crystal structure of SARS-CoV C-domain complexed with human ACE2 (PDB identification no. 2AJF). (C) A hydrophobic tunnel structure at the NL63-CoV/ACE2 interface, comprising residues Tyr41 and Asp37 from ACE2 and Ser535 and Tyr498 from the NL63-CoV C-domain. (D) A hydrophobic tunnel structure at the SARS-CoV/ACE2 interface, comprising residues Tyr41 and Asp37 from ACE2 and Thr487 and Tyr491 from the SARS-CoV C-domain. The hydrophobic tunnels in panels C and D bury a salt bridge between Lys353 and Asp38 from ACE2. ACE2, angiotensin-converting enzyme 2; VBM, virus-binding motif.

Surprisingly, despite their different primary, secondary, and tertiary structures, the SARS-CoV and NL63-CoV C-domains share related structural topologies (i.e., connectivity of secondary structural elements) (Fig. 2C and D). Close inspection of the NL63-CoV and SARS-CoV C-domains reveals the following structural differences between the two proteins. First, strands β4 and β1 in NL63-CoV become helices α4 and α1, respectively, in SARS-CoV. Second, strand β2 in NL63-CoV is missing in SARS-CoV. Third, strands β6 and β3 are located at the center of the β-sheet in SARS-CoV but have been moved to one side of the β-sheet in NL63-CoV. Taking these structural differences into account, virtually all of the secondary structural elements in the two C-domains are connected in the same order from the N terminus to the C terminus. The shared structural topology of their C-domains strongly suggests that the S1 subunits of NL63-CoV and SARS-CoV have the same evolutionary origin and that the current structural differences between them result from extensive divergent evolution.
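The topology comparison above amounts to a small bookkeeping exercise: rewrite one domain's N-to-C list of secondary structural elements using the stated differences (β4 becomes α4, β1 becomes α1, β2 is deleted), then ask how much of the order is shared. The Python sketch below assumes a hypothetical N-to-C order for both domains, written with ASCII labels (b = β strand, a = α helix); the real orders are shown in Fig. 2C and D and are not spelled out in the text. The third difference, repositioning β6 and β3 within the sheet, changes spatial arrangement but not connectivity, so it is not modeled. Scoring agreement as a longest-common-subsequence fraction is a choice made for this sketch, to capture "virtually all" elements rather than demand an exact match.

```python
def project_topology(order, converted, deleted):
    """Rewrite an N-to-C list of element labels into the other domain's vocabulary.

    converted: maps a label to its counterpart (e.g., strand -> helix).
    deleted:   labels absent from the other domain.
    """
    return [converted.get(e, e) for e in order if e not in deleted]


def lcs_length(a, b):
    """Length of the longest common subsequence (shared N-to-C order)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def topology_agreement(a, b):
    """Fraction of elements connected in the same order (LCS / longer list)."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


# Hypothetical N-to-C orders, for illustration only.
nl63 = ["b1", "b2", "b3", "b4", "b5", "b6", "b7", "b8"]
sars = ["a1", "b3", "a4", "b5", "b6", "b7", "b8"]

projected = project_topology(nl63, {"b1": "a1", "b4": "a4"}, {"b2"})
print(projected)                            # ['a1', 'b3', 'a4', 'b5', 'b6', 'b7', 'b8']
print(topology_agreement(projected, sars))  # 1.0
print(topology_agreement(nl63, sars))       # 0.625 before accounting for the differences
```

The contrast between the last two lines mirrors the argument in the text: raw element lists look dissimilar, but once the documented strand-to-helix conversions and the deletion are applied, the connectivity lines up.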

FIG 4 Summary of the structure, function, and evolution of coronavirus S1 domains. APN, aminopeptidase N; CEACAM1, carcinoembryonic antigen-related cell adhesion molecule 1.

Despite their related structural topologies, the NL63-CoV and SARS-CoV C-domains bind their common receptor, ACE2, through different molecular mechanisms. The SARS-CoV C-domain contains a single long, continuous subdomain that binds ACE2 (Fig. 2B and D); this subdomain has been termed the receptor-binding motif (RBM). The NL63-CoV C-domain contains three short, discontinuous ACE2-binding loops, termed RBM1, RBM2, and RBM3 (Fig. 2A and C). The SARS-CoV RBM is topologically equivalent to NL63-CoV RBM3 because both connect strands β7 and β8. However, compared with NL63-CoV RBM3, the SARS-CoV RBM is much longer and makes many more contacts with ACE2. In addition, although the NL63-CoV and SARS-CoV C-domains both bind to the same virus-binding motifs (VBMs) on ACE2, ACE2 is bound in different orientations when the two C-domains are structurally aligned (Fig. 3A and B). Furthermore, although the NL63-CoV and SARS-CoV C-domains both form an energetically stabilizing hydrophobic tunnel structure with ACE2, these interactions involve RBM1 and RBM2 of the NL63-CoV C-domain and the RBM of the SARS-CoV C-domain, but not NL63-CoV RBM3, which is topologically equivalent to the SARS-CoV RBM (Fig. 3C and D). Overall, because the NL63-CoV and SARS-CoV C-domains use distinct molecular mechanisms to recognize ACE2, their shared ACE2 binding is likely the outcome of convergent evolution of the two C-domains toward a common virus-binding hot spot on ACE2.
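Topological equivalence of binding loops, as used above, can be stated as a rule: two loops are equivalent when they connect the same pair of flanking secondary structural elements. The Python sketch below encodes that rule on partial, hypothetical segment lists containing only what the text states (NL63-CoV RBM3 and the SARS-CoV RBM both lie between β7 and β8, written here as "b7" and "b8"); the flanks of NL63-CoV RBM1 and RBM2 are not given in the text and are therefore omitted. The function names are illustrative.

```python
def loop_flanks(segments):
    """Map each named loop to the secondary structural elements on either side.

    segments: N-to-C list of (kind, name) pairs, kind being "sse" or "loop".
    A loop at a chain terminus gets None on the open side.
    """
    flanks = {}
    for i, (kind, name) in enumerate(segments):
        if kind != "loop":
            continue
        before = next((n for k, n in reversed(segments[:i]) if k == "sse"), None)
        after = next((n for k, n in segments[i + 1:] if k == "sse"), None)
        flanks[name] = (before, after)
    return flanks


def equivalent_loops(a, b):
    """Pair loops from two domains that connect the same two elements."""
    fa, fb = loop_flanks(a), loop_flanks(b)
    return sorted(
        (x, y)
        for x, fx in fa.items()
        for y, fy in fb.items()
        if fx == fy and None not in fx  # terminal loops have no defined pair
    )


# Partial segment lists, limited to what the text states.
nl63_part = [("sse", "b7"), ("loop", "RBM3"), ("sse", "b8")]
sars_part = [("sse", "b7"), ("loop", "RBM"), ("sse", "b8")]
print(equivalent_loops(nl63_part, sars_part))  # [('RBM3', 'RBM')]
```

As the paragraph stresses, positional equivalence does not imply functional equivalence: NL63-CoV RBM3 pairs with the SARS-CoV RBM under this rule, yet it does not take part in the hydrophobic tunnel at the ACE2 interface.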

This study provides the first evidence that the S1 subunits of different coronavirus genera share the same evolutionary origin but have undergone extensive divergent evolution. It suggests that the NTDs of different coronavirus genera have related structural topologies, as do their C-domains (Fig. 4). To date, the tertiary structures of the alpha-CoV and gamma-CoV NTDs and of the gamma-CoV C-domains have remained unknown. We infer that the alpha-CoV and gamma-CoV NTDs likely share structural topologies with the beta-CoV MHV NTD, which is believed to have originated from a host galectin but to have later evolved a carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1)-binding function (4). Indeed, sugar-binding functions have been preserved in all three major coronavirus genera (6, 7), suggesting that the galectin fold of the beta-CoV MHV NTD also exists in the alpha-CoV and gamma-CoV NTDs. We can likewise infer that gamma-CoV C-domains likely share structural topologies with the alpha-coronavirus NL63-CoV and beta-coronavirus SARS-CoV C-domains. In fact, the sequence similarity between NL63-CoV S1 and gamma-CoV IBV S1 is higher than that between NL63-CoV S1 and SARS-CoV S1, whose C-domains have related structural topologies (Fig. 1B). The alpha-CoV C-domains may have undergone divergent evolution to acquire APN- or ACE2-binding functions, whereas the alpha-coronavirus NL63-CoV and beta-coronavirus SARS-CoV C-domains may have undergone first divergent and then convergent evolution to acquire ACE2-binding functions. During the long and complicated evolutionary history of coronaviruses, sugars have likely served as primordial and fallback receptors, allowing coronaviruses to search for additional, high-affinity protein receptors. Overall, this study enhances our understanding of the origin, evolution, structures, and functions of coronavirus spike proteins. Future structural determinations of coronavirus S1 domains whose atomic structures are currently unknown will further clarify the curious evolutionary relationships among coronavirus spike proteins.